Abstract. - We propose a new self-organizing mechanism behind the emergence of memory
in which temporal sequences of stimuli are transformed into spatial activity patterns. In
particular, the memory emerges despite the absence of temporal correlations in the stimuli.
This suggests that neural systems may prepare a spatial structure for processing information
before the information itself is available. A simple model illustrating the mechanism is
presented based on three principles: (1) competition between neural units, (2) Hebbian
plasticity, and (3) recurrent connections.



Memory is believed to be a universal feature of the nervous system, and exciting results
improving our understanding of the molecular as well as the organizational mechanisms
underlying memory have been obtained in recent decades. On the organizational level,
significant work has been devoted to the study of "brain maps" underlying the ability to
recognize patterns or features from a given sensory input. Many intriguing suggestions have
been given as to how a memory emerges that is able to extract and recall features from a
spatial pattern of neural activity.

In this Letter, we focus on the mechanism behind self-organization from a temporal
sequence of activity. Time is important in many cognitive tasks, e.g. vision, speech, signal
processing and motor control. The crucial point is how to represent time, and methods often
involve time delays in one form or another. How does a structured memory emerge
that can cope with temporal sequences of activity? For example, the information we receive
through a temporal sequence of input must, at least to some extent, be memorized spatially
in the neuronal activity pattern. Here we present a simple conceptual model for the
time-to-space transformation, from which a memory emerges.

The fundamental assumptions of the model presented here are the following: (1) Competition
between neural units: excited neural units have an inhibiting effect on other units. In
the limit of strong inhibition this is winner-take-all, where only the region of units with the
strongest excitation remains active, suppressing all surrounding units. (2) Hebbian plasticity,
an abstract formulation of long-term potentiation depending on pre- and postsynaptic
activity: if activity of unit A is followed by activity of unit B, the connection from A to
B is strengthened. (3) Recurrent connectivity, which opens up the possibility of ongoing
information processing in the network by internal feedback.
Features (1) and (2) mentioned above are employed by the self-organizing map model
formulated by Kohonen. Recently it has been argued that the self-organizing map is a
biologically plausible large-scale model of cortical information processing. However, the
self-organizing map has a purely unidirectional information flow without internal dynamics. We
know of few attempts to explicitly integrate memory of past stimuli into the self-organizing
map [13], [14], [15, 16, 17]. These approaches have been shown to work well on specific tasks.



The scope of the current paper is to investigate in general, i.e. independently of any
particular task, the formation of internal dynamics that can lead to the formation of memory.
In our approach, memory is not designed but emerges as a result of the self-organized dynamics
of the neural system.

Consider $M$ neural units arranged as a one-dimensional lattice with periodic boundary
conditions (a ring). The model describes the time-discrete evolution of the real-valued
activities $y_0(t), \ldots, y_{M-1}(t)$ of the units. At a given time step $t$ each unit $i$
receives a recurrent excitation $h_i^{\mathrm{rec}}(t) = \sum_j w_{ij} y_j(t-1)$ through
connections $w_{ij}$. Additionally there is an $S$-dimensional input
$x = (x_1(t), \ldots, x_S(t))$ to the system causing an external excitation
$h_i^{\mathrm{ext}}(t) = \sum_j v_{ij} x_j(t)$ through connections $v_{ij}$. The total
excitation is $h_i(t) = h_i^{\mathrm{rec}}(t) + h_i^{\mathrm{ext}}(t)$. Next, we define the
centre of activity $i^*$ as the unit with the largest total excitation:
$i^*(t) = \arg\max_i h_i(t)$. The updated unit activities form a Gaussian profile around the
centre of activity (we suppress $t$ in the notation here)

$$ y_i = c \exp\left( -\frac{\mathrm{dist}^2(i, i^*)}{2\sigma^2} \right), \qquad (1) $$

where $\mathrm{dist}(i, i^*)$ denotes the distance between units $i$ and $i^*$ in lattice
points. The model parameter $\sigma$ is a measure of the width of the neural activity field.
The constant value $c > 0$ is chosen such that the normalization $\sum_i y_i^2 = 1$ holds.
Finally, all connections are updated according to a Hebb rule with a saturation term. Each
recurrent connection $w_{ij}$ is changed by

$$ \Delta w_{ij} = \eta\, y_i(t) \left( y_j(t-1) - w_{ij} \right), \qquad (2) $$

where $\eta > 0$ is a constant learning rate. Correspondingly, the increment for the input
connections is

$$ \Delta v_{ij} = \eta\, y_i(t) \left( x_j(t) - v_{ij} \right). \qquad (3) $$

This completes one time step of the dynamics. The learning rate has the value $\eta = 0.2$ in
all the simulations presented in the following. The length scale is taken to be
$\sigma = 1.0$. The connections $w_{ij}$ and $v_{ij}$ are initialized with random values in
the interval [0; 0.001].
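One time step of the dynamics described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names, the zero initial activity, the tie-breaking rule of the arg max (lowest index wins) and the factor 2 in the Gaussian exponent are assumptions of the sketch.

```python
import math
import random

def ring_dist(i, j, m):
    """Distance in lattice points between units i and j on a ring of m units."""
    d = abs(i - j)
    return min(d, m - d)

def gaussian_profile(i_star, m, sigma):
    """Gaussian activity bump around i_star, scaled so that sum(y_i^2) == 1.
    The factor 2 in the exponent is an assumption of this sketch."""
    raw = [math.exp(-ring_dist(i, i_star, m) ** 2 / (2.0 * sigma ** 2))
           for i in range(m)]
    c = 1.0 / math.sqrt(sum(r * r for r in raw))
    return [c * r for r in raw]

def init_network(m, s, rng):
    """Recurrent weights w (m x m) and input weights v (m x s) in [0, 0.001]."""
    w = [[rng.uniform(0.0, 0.001) for _ in range(m)] for _ in range(m)]
    v = [[rng.uniform(0.0, 0.001) for _ in range(s)] for _ in range(m)]
    return w, v

def step(w, v, y_prev, x, eta=0.2, sigma=1.0):
    """One time step: total excitation, centre of activity, new activities,
    then the Hebbian updates with saturation term. Modifies w and v in place."""
    m, s = len(w), len(x)
    h = [sum(w[i][j] * y_prev[j] for j in range(m)) +
         sum(v[i][j] * x[j] for j in range(s)) for i in range(m)]
    i_star = max(range(m), key=h.__getitem__)   # ties: lowest index wins
    y = gaussian_profile(i_star, m, sigma)
    for i in range(m):
        for j in range(m):
            w[i][j] += eta * y[i] * (y_prev[j] - w[i][j])
        for j in range(s):
            v[i][j] += eta * y[i] * (x[j] - v[i][j])
    return i_star, y

# Usage: M = 64 units, S = 2 inputs, two stimuli drawn with equal probability.
rng = random.Random(0)
w, v = init_network(64, 2, rng)
y = [0.0] * 64                     # assumed initial activity (not given in the text)
centres = []
for t in range(100):
    x = rng.choice([(2.0, 0.0), (0.0, 2.0)])
    i_star, y = step(w, v, y, x)
    centres.append(i_star)
```

Because each update moves a weight a fraction $\eta y_i \le 0.2$ of the way towards the presynaptic value, the weights stay bounded without any explicit normalization step.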

The memorization ability of the network is the degree to which the state of the network,
given by $i^*$, depends on the past stimuli. A suitable measure of statistical dependence
between the two stochastic variables is their mutual information. Given a discrete set $X$ of
possible stimuli, the mutual information between the current centre of activity $i^*(t)$ and
the stimulus $x(t-\tau)$ presented $\tau$ time steps before reads

$$ T_\tau = \sum_i \sum_{x' \in X} p_\tau(i, x') \log_2 \frac{p_\tau(i, x')}{\Pr(i^* = i)\, \Pr(x = x')}, \qquad (4) $$

where $p_\tau(i, x') = \Pr(i^*(t) = i \wedge x(t-\tau) = x')$ is the joint distribution of the
centre of activity and the past stimulus. When estimating the joint probability distribution
$p_\tau$ and its marginals for a given network at a certain time, the dynamics is sampled over
5000 time steps with $\eta = 0$. Consequently these time steps are not included in the
learning time measured.
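The estimate of the mutual information from sampled dynamics can be illustrated by a simple plug-in estimator that replaces the probabilities by relative frequencies. The sketch below is a hedged illustration; the function names and the choice of a plain frequency estimator (without any bias correction) are assumptions, not taken from the text.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate, in bits, of the mutual information between the two
    components of the observed (state, stimulus) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    n_state = Counter(i for i, _ in pairs)
    n_stim = Counter(x for _, x in pairs)
    total = 0.0
    for (i, x), count in joint.items():
        # p(i,x) / (p(i) p(x)) written in terms of raw counts
        ratio = count * n / (n_state[i] * n_stim[x])
        total += (count / n) * math.log2(ratio)
    return total

def lagged_information(states, stimuli, tau):
    """Mutual information between the state at time t and the stimulus at t - tau."""
    pairs = [(states[t], stimuli[t - tau]) for t in range(tau, len(states))]
    return mutual_information(pairs)

# Usage: a toy "network" whose state simply copies the previous stimulus.
stimuli = [0, 0, 1, 1] * 50
states = [0] + stimuli[:-1]
info_lag1 = lagged_information(states, stimuli, 1)   # close to 1 bit
```

In an actual measurement the `states` list would hold the centres of activity recorded while learning is switched off, as described above.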

Let us now demonstrate the emergence of memory by simulations where the network is
presented with a random time series. The considered networks have $M = 64$ units and $S = 2$
inputs. We present only two different orthogonal stimuli, $x = (2, 0)$ and $x = (0, 2)$. We
use 0 and 1 as shorthand for the two stimulus vectors. At each time step one of the vectors 0
and 1 is drawn randomly with probability $p = 0.5$.




Fig. 1 - The time evolution of the mutual information $T_\tau$ (horizontal axis: time $t$)
indicates the formation of memory. The stream of stimuli $x(t)$ contains 1 bit of information
per time step (two different stimuli presented with equal probability). Thus $T_\tau = 1$
means that the network perfectly remembers the stimulus presented $\tau$ time steps before,
whereas $T_\tau = 0$ means statistical independence between the stimulus and the network
state. The displayed results were obtained as averages over 100 independent simulation runs
with networks of size $M = 64$ units.

Fig. 1 shows the time evolution of the mutual information $T_\tau$. Originally the state of
the network depends only on the current stimulus. This means $T_0 > 0$, but $T_\tau = 0$ for
all $\tau > 0$. After approximately 40 time steps the two stimuli are always discriminated by
different network states ($T_0 = 1$). Before step $t = 200$ we observe the emergence of
memory: $T_1 = 1$ indicates full discrimination between stimuli presented in the previous time
step. With further learning the memory length expands to more time steps, hence $T_2 > 0$,
$T_3 > 0$ and so on. The maximum information content of the network is bounded by the number
$M$ of possible states (centres of activity). Thus the condition
$\sum_{\tau=0}^{\infty} T_\tau \le \log_2 M = 6$ causes a saturation in the formation of the
memory.

Fig. 2 - Spatial representation of memory. (a)-(e) The return map of the centre of activity
$i^*$ after $0, \ldots, 10^5$ steps of learning. The diagrams show $i^*(t)$ as a function of
$i^*(t-1)$. The function has two branches (filled and unfilled squares) corresponding to the
two different values the stimulus $x(t)$ can assume. (f) Idealized return map for a network
with $M = 8$ units. Each unit represents a certain history of stimuli. The histories of the
units are given as bit strings on the axes.

More insight can be gained by considering the geometrical structure of the memory. In
Fig. 2 we have plotted the evolution of the return map of the network dynamics for a typical
simulation run configured as in the previous section. The diagrams are to be interpreted as
follows: the abscissa is the centre of activity $i^*(t-1)$ in the previous time step. The
ordinate is the subsequent centre of activity $i^*(t)$. Depending on the given stimulus
$x(t)$, either the filled or the unfilled squares represent the mapping
$i^*(t-1) \to i^*(t)$. We observe that the two branches of the return map tend to become
straight lines with slopes 1/2 and -1/2, respectively. Panel (f) of Fig. 2 shows an idealized
version for the case of $M = 8$ units. The emerging return map $f$ can be interpreted as the
inverse of a tent map where the ambiguity of the two branches is resolved by the given
stimulus:

$$ f_x(i^*) = \begin{cases} \lfloor i^*/2 \rfloor, & \text{if } x = 0 \\ M - 1 - \lfloor i^*/2 \rfloor, & \text{if } x = 1 \end{cases} \qquad (5) $$

By $\lfloor \cdot \rfloor$ we denote the integer part of the argument. In order to understand
how the stimuli are stored in the network state, it is convenient to write the centre of
activity as a binary number $i^* = \sum_{k=0}^{L-1} 2^k i_k =: (i_{L-1}, \ldots, i_0)$, where
$L = \log_2 M$ denotes the number of bits used. Writing also the stimulus $x$ as a binary
value $x \in \{0, 1\}$, the return map Eq. (5) reads

$$ f_x(i_{L-1}, \ldots, i_0) = (x,\; i_{L-1} \oplus x,\; i_{L-2} \oplus x,\; \ldots,\; i_1 \oplus x). \qquad (6) $$
The operation $\oplus$ is the exclusive-or ($a \oplus b = 0$ if $a = b$, otherwise
$a \oplus b = 1$). Thus the operation $f_0$ shifts all bits of the argument to the right,
discards the least significant bit and inserts $x$ as the most significant bit. $f_1$
additionally inverts all bits of the argument. Applying $f_x$ iteratively $\tau \ge L$ times
we obtain

$$ i^*(t) = \left( x(t),\; x(t) \oplus x(t-1),\; \ldots,\; x(t) \oplus x(t-1) \oplus \cdots \oplus x(t-L+1) \right). \qquad (7) $$

Thus at any time $t$ the values $x(t), x(t-1), \ldots, x(t-L+1)$ of the $L$ previous stimuli
can be extracted from $i^*(t)$. Note that, due to the non-linear superposition of the stimuli
by the exclusive-or, the memory effect in general cannot be observed when applying purely
linear measures. In particular, the linear correlation function between $i^*(t)$ and
$x(t-\tau)$ vanishes for $\tau > 0$. However, using the mutual information $T_\tau$
(Eq. (4)) one detects the memorization of past stimuli in $i^*(t)$. For more than two discrete
stimuli the emergence of memory is observed accordingly, forming a return map with more than
two branches.
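The idealized return map and the read-out of past stimuli from the network state can be checked with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the decoding rule (each stimulus is the exclusive-or of two neighbouring bits of the state) is derived here from the bit-string form of the map rather than quoted from the text.

```python
import random

def f_arith(i, x, m):
    """Floor-based form of the idealized return map for a ring of m units."""
    return i // 2 if x == 0 else m - 1 - i // 2

def f_bits(i, x, n_bits):
    """Bit-string form: shift right, XOR the remaining bits with x,
    and insert x as the most significant bit."""
    shifted = i >> 1
    if x == 1:
        shifted ^= (1 << (n_bits - 1)) - 1     # invert the n_bits - 1 low bits
    return (x << (n_bits - 1)) | shifted

def decode_history(i, n_bits):
    """Return [x(t), x(t-1), ..., x(t-L+1)] recovered from the state i*(t)."""
    bits = [(i >> k) & 1 for k in range(n_bits - 1, -1, -1)]  # most significant first
    history = [bits[0]]
    for k in range(1, n_bits):
        history.append(bits[k] ^ bits[k - 1])
    return history

# Usage: M = 64 units, i.e. L = 6 bits, driven by 50 random binary stimuli.
M, L = 64, 6
rng = random.Random(1)
state, stimuli = 0, []
for t in range(50):
    x = rng.randint(0, 1)
    stimuli.append(x)
    state = f_bits(state, x, L)
recent = stimuli[-1:-L - 1:-1]                 # last L stimuli, most recent first
decoded = decode_history(state, L)
```

After at least $L$ iterations the decoded list no longer depends on the initial state, which is why the comparison with `recent` is made only at the end of the run.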

We now consider the case of asymmetry in the presentation of stimuli. We use the same two
stimuli as in the preceding sections. Unlike before, we allow the probability $p$ of
presenting stimulus 1 to assume values different from the symmetric case $p = 0.5$. The amount
of information per time step in the stream of stimuli is then given by the Shannon function
$S(p) = -p \log_2(p) - (1 - p) \log_2(1 - p)$.
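As a quick numerical illustration, the information per time step of the binary stimulus stream can be evaluated as follows. The function name is hypothetical, and the signs are chosen so that the result is the non-negative binary entropy in bits.

```python
import math

def shannon(p):
    """Information (bits) per time step of a binary source emitting 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0                      # a deterministic stream carries no information
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

# Usage: information per step for a few values of p.
values = {p: shannon(p) for p in (0.5, 0.3, 0.1, 0.05)}
```

The symmetric case $p = 0.5$ gives exactly 1 bit per step, while $p = 0.05$ gives roughly 0.29 bits, so an asymmetric stream leaves room for a longer memory within the same capacity.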

Figure 3 shows the mutual information as a function of the time lag $\tau$ for different
values of $p$. For small $\tau$ the mutual information is close to $S(p)$ for all considered
values of $p$. This means that in any case the network almost perfectly memorizes a few
preceding time steps. However, varying $p$ causes a redistribution of memory: as the parameter
$p$ decreases, the decay of $T_\tau$ with growing $\tau$ becomes weaker. The smaller $p$, the
"longer" the memory. Thus the neural network automatically adapts to the statistics of the
stimuli.

Again we consider the emerging return map, as done before in Fig. 2 for the special case of
$p = 0.5$. Lowering $p$ reduces the number of units stimulus 1 is mapped to, thereby
increasing the number of units stimulus 0 is mapped to. Compared with Fig. 2(e), the unfilled
branch of the return map becomes steeper whereas the filled branch becomes flatter. For values
of $p < 0.1$, typically a return map as shown in Fig. 4(a) develops. Here one branch of the
map is a constant (horizontal line), such that after presentation of the infrequent stimulus
the centre of activity $i^*(t)$ does not depend on the previous one, $i^*(t-1)$. As
illustrated by Fig. 4(b), the network state $i^*$ passes through a transient and reaches an
attractor, provided the frequent stimulus is presented persistently. The network state is thus
a measure of the time that has passed since the last presentation of the infrequent stimulus.

In summary, we have formulated and examined a simple model of memory dynamics based
on a few assumptions. We have shown that the dynamics based on these assumptions readily
builds up a structure for systematic storage of recent stimuli. We have also demonstrated the
adaptation of the memory in reaction to the information contained in the stimuli. Importantly,
no correlations in the stream of stimuli are required for the structure to emerge. A neural
network can learn a basic spatial representation of temporal information before temporally
correlated information itself is available. Noise is enough to build up a memory.
Fig. 3 - Adaptation to asymmetry in the occurrence of the stimuli, as the probability $p$ of
presenting the stimulus 1 deviates from 0.5. The network adapts to the value of $p$ by varying
the memory length. $p < 0.5$ decreases the information per time step in the stream of stimuli.
Then longer memories are possible. Each plotted value is an average over 100 independent runs
with networks of size M. The networks had learnt for 100,000 time steps before mutual
information was estimated.
Fig. 4 - Typical simulation results for the case that one of the two stimuli occurs very
rarely (here with probability $p = 0.05$). (a) Return map of $i^*$ after $10^6$ learning
steps. When the rare stimulus $x(t) = 1$ occurs, $i^*(t) = 2$ becomes the centre of activity
(filled horizontal branch). Presentation of the other stimulus $x = 0$ in the subsequent time
steps leads to the iteration dynamics indicated by the dotted lines. (b) Corresponding time
series of the centre of activity $i^*$. 20 time steps after the last presentation of stimulus
1 the dynamics reaches a two-cycle.